MCAT Math Retrieval System for NTCIR-12 MathIR Task

نویسندگان

  • Giovanni Yoko Kristianto
  • Goran Topic
  • Akiko Aizawa
چکیده

This paper describes the participation of our MCAT search system in the NTCIR-12 MathIR Task. We introduce three granularity levels of textual information, new approach for generating dependency graph of math expressions, score normalization, cold-start weights, and unification. We find that these modules, except the cold-start weights, have a very good impact on the search performance of our system. The use of dependency graph significantly improves precision of our system, i.e., up to 24.52% and 104.20% relative improvements in the Main and Simto subtasks of the arXiv task, respectively. In addition, the implementation of unification delivers up to 2.90% and 57.14% precision improvements in the Main and Simto subtasks, respectively. Overall, our best submission achieves P@5 of 0.5448 in the Main subtask and 0.5500 in the Simto subtask. In the Wikipedia task, our system also performs well at the MathWikiFormula subtask. At the MathWiki subtask, however, due to a problem with handling queries formed as questions that contain many stop words, our system finishes second.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies

This paper summarizes the experience of Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Based on NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform. Using this platform we rigorously evaluated combinations of new features and picked the most promising on...

متن کامل

Tangent-3 at the NTCIR-12 MathIR Task

We present the math-aware search engine Tangent-3 and report its results for the NTCIR-12 MathIR task. Tangent uses a federated search over two indices: 1) a TF-IDF textual search engine (Solr), and 2) a query-by-expression engine. We use an inverted index to store math expressions using pairs of symbols extracted from a Symbol Layout Tree representation built from Presentation MathML. We use a...

متن کامل

Exploring the One-brain Barrier: A Manual Contribution to the NTCIR-12 MathIR Task

This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...

متن کامل

The Math Retrieval System of ICST for NTCIR-12 MathIR Task

This paper is the summarized experiences of ICST team in the NTCIR-12 MathIR main tasks (ArXiv and Wikipedia main task). Our approach is based on keyword, structure and importance of formulae in a document. A novel hybrid indexing and matching model is proposed to support exact and fuzzing matching. In this hybrid model, both keyword and structure information of formulae are taken into consider...

متن کامل

A Document Retrieval System for Math Queries

We present and analyze the results of our Math search system in the MathIR tasks in the NTCIR-12 Information Retrieval challenge. The Math search engine in the paper utilizes the co-occurrence finding technique of LDA and doc2vec to bring more contextual search. Additionally, it uses common patterns to improve the search output. To combine various scoring algorithms, it uses hybrid ranking mech...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016